(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 
International Bureau 

(43) International Publication Date 
5 April 2001 (05.04.2001) 




PCT 



(10) International Publication Number 

WO 01/24526 A1



(51) International Patent Classification 7 : H04N 7/173 

(21) International Application Number: PCT/EP00/09078

(22) International Filing Date: 15 September 2000 (15.09.2000)

(25) Filing Language: English

(26) Publication Language: English



(30) Priority Data: 

09/406,642 27 September 1999 (27.09.1999) US 

(71) Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V. [NL/NL];
Groenewoudseweg 1, NL-5621 BA Eindhoven (NL).



(72) Inventor: MALLART, Raoul; Prof. Holstlaan 6, 
NL-5656 AA Eindhoven (NL). 

(74) Agent: GRAVENDEEL, Cornelis; Internationaal
Octrooibureau B.V., Prof. Holstlaan 6, NL-5656 AA Eindhoven (NL).

(81) Designated States (national): CN, JP, KR. 

(84) Designated States (regional): European patent (AT, BE,
CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC,
NL, PT, SE).

Published: 

— With international search report. 

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.



(54) Title: SCALABLE SYSTEM FOR VIDEO-ON-DEMAND 




(57) Abstract: A VOD service is emulated in an NVOD architecture. Content information is made available to an end-user in the
NVOD architecture. An introductory portion of the content information is stored at the end-user's equipment, e.g., by downloading
overnight. During play-out of the introductory portion at the end-user, the content information supplied in the NVOD
architecture is buffered at the end-user's equipment. The equipment is controlled to switch from playing out the stored
introductory portion to playing out the buffered content information.



Scalable system for video-on-demand. 

FIELD OF THE INVENTION

The invention relates in particular to a system and method to implement a 
video-on-demand (VOD) service using any transmission network such as cable TV and the 
Internet. 

BACKGROUND ART 

A VOD delivery system is a system for giving program information through
bidirectional transmission paths between a supply center and subscribers. In a wider sense, the
VOD delivery system handles multimedia information including still images, high-quality
television images, computer software, etc. However, the term "VOD delivery system" is often
used in a narrower sense as a system for handling movies, television programs, etc. Typically,
therefore, the term video-on-demand (VOD) refers to a service that enables subscribers to
select videos from a central server for viewing on a television or the display monitor of a PC.
Owing to the large amounts of data required by video, VOD via a data network such as the
Internet does not scale to a large number of users. VOD requires huge network bandwidth and
huge servers. Near-VOD (NVOD) is a solution to the scalability problem of VOD. But users
do not get true control over the video: they neither control the starting time, nor can they pause
the program or rewind the video. In NVOD programming, the interactive entertainment
system broadcasts several time-shifted versions of an interactive application (i.e., broadcasts
duplicate versions of the application, with the starting time of each version offset by a unique,
predetermined time increment) to all of its subscribers over shared communication paths.
Typically, interactive systems utilize NVOD services to provide several presentations of a
movie, where the presentation start-times are staggered so that no two presentations start at the
same time.
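
By way of illustration only, the staggered start times of an NVOD service can be sketched as follows. The function name, the use of seconds, and the assumption that enough channels are used for each copy to restart as soon as it ends are hypothetical and are not taken from the cited art.

```python
import math


def staggered_start_times(program_duration_s, stagger_s):
    """Return the start offset (in seconds) of each time-shifted copy.

    Assumption (not from the text): ceil(duration / stagger) channels are
    used, so that a new copy of the program starts every `stagger_s` seconds.
    """
    if program_duration_s <= 0 or stagger_s <= 0:
        raise ValueError("durations must be positive")
    n_channels = math.ceil(program_duration_s / stagger_s)
    # Channel i starts i * stagger_s after channel 0, so no two copies
    # start at the same time.
    return [i * stagger_s for i in range(n_channels)]
```

For a two-hour program and a 15-minute stagger this yields eight channels, and a subscriber waits at most one stagger interval for the next start.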

Published European Patent Application EP 0 749 242 A1 describes a VOD
system that has a server which operates in the near-video-on-demand (NVOD) mode. The
server transmits multiple copies of each program via multiple, separate transmission NVOD
channels. The transmission of a specific program via one NVOD channel is offset in time by a
fixed time interval relative to transmission of the same program via another NVOD channel.
As a result, different NVOD channels show different stages of evolution of the same program.
Upon a request from a client for receipt of a particular program, the server sends to the
requesting client the beginning portion of the program via a VOD channel that is not an
NVOD channel. The beginning portion provided has a duration equal to or shorter than the
stagger time interval. The client is controlled to start recording the program in progress on a
specific one of the NVOD channels. The specific NVOD channel is the one on which the
transmission of the program started the shortest time period ago relative to the request. This
ensures an overlap in information content, from a certain moment on, between the recording
and the play-out of the beginning portion. The overlap enables an, in principle, seamless
transition in switching from the VOD channel to the specific NVOD channel.

A disadvantage of the known system is that it is not scalable to the number of 
users. Each request for a beginning portion needs a separate VOD channel to the end user, on 
top of the number of NVOD channels to implement the NVOD mode. 
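
The scalability argument can be made concrete with a small arithmetic sketch. The function and parameter names are hypothetical; the sketch only restates the channel count described above and contrasts it with a system in which the beginning portion is already held at the client.

```python
def channels_needed(nvod_channels, concurrent_intro_requests, intro_stored_locally):
    """Count transmission channels for one program.

    Known system: every pending request for a beginning portion occupies
    its own VOD channel on top of the NVOD channels.
    Local storage (assumed alternative): only the NVOD channels are needed,
    independent of the number of users.
    """
    if nvod_channels < 0 or concurrent_intro_requests < 0:
        raise ValueError("counts must be non-negative")
    if intro_stored_locally:
        return nvod_channels
    return nvod_channels + concurrent_intro_requests
```

With 8 NVOD channels and 1000 simultaneous requests, the known system needs 1008 channels, whereas the count stays at 8 when no per-request channel is required.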

SUMMARY OF THE INVENTION

An object of the invention is to provide a novel method of emulating a VOD
service in an NVOD architecture. Another object is to provide a method that allows for
scalability. In the method of the invention content information is being made available to an
end-user in the NVOD architecture. An introductory portion of the content information is
enabled to be stored at the end-user's equipment. During play-out of the introductory portion
at the end-user the content information supplied via the NVOD architecture is enabled to be
buffered at the equipment. The method further enables to switch from playing out the stored
introductory portion to playing out the buffered content information. In a more specific
embodiment, the content information comprises multiple programs, and the introductory
portion comprises respective ones of multiple introductory parts associated with respective
ones of the multiple programs. The switching now comprises shifting from playing out a
specific introductory part to playing out a specific program associated with the specific
introductory part.

The inventor proposes to store at the user's client the beginning portions of all
different programs available in the NVOD mode. The user can thus surf the channels at his/her
own storage device, that preferably comprises a hard disk, and select a program based on the
stored portions. Once a portion is selected and is being played out, recording is started of the
relevant NVOD channel nearest in time for that particular program. The local storage of the
introductory portions makes the system scalable. The introductory portions could be stored,
e.g., overnight while recording from a certain TV channel, or could be downloaded via a data
network, e.g., the Internet, or could be provided stored on a physical device, e.g., a DVD or a
card with a solid state semiconductor memory, e.g., a flash memory card.

Streaming of video over IP networks requires some buffering of the data. This
buffering is required to minimize the adverse effects of the low quality of service of IP
networks. This delays start-up time. The invention leverages local mass storage capability,
e.g., on a set top box (STB), and uses an NVOD service to emulate a VOD service. Data
broadcasting stores the beginning of all movies in a collection on the local mass storage device
of the set top box. When the user chooses to watch a specific movie, the STB starts playing the
movie from the local storage. In parallel, the STB tunes to the proper NVOD channel and
starts buffering the rest of the movie. The portion of the movie stored locally need not be
longer in duration than the staggering interval of the NVOD service. The STB can provide an
instantaneous start of the movie playback. Since the STB buffers the video stream, the user
can pause and rewind the movie as if he/she were controlling a VCR. Of course, fast-forwarding
past the current buffer content is not possible. If the buffer is a recirculating buffer,
i.e., the stored content is overwritten by new content each time the buffer is full, a fast rewind
past the overwritten content is not possible either.
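
The limits on rewinding and fast-forwarding in a recirculating buffer can be illustrated with the following minimal sketch. The class and method names are hypothetical, frames are addressed by an absolute index counted from the first frame written, and nothing here is prescribed by the embodiment.

```python
from collections import deque


class RecirculatingBuffer:
    """Fixed-capacity buffer whose oldest frame is overwritten when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._frames = deque(maxlen=capacity)
        self._first_index = 0  # absolute index of the oldest retained frame

    def write(self, frame):
        # When the buffer is full, appending drops the oldest frame.
        if len(self._frames) == self._frames.maxlen:
            self._first_index += 1
        self._frames.append(frame)

    @property
    def oldest(self):
        return self._first_index

    @property
    def newest(self):
        return self._first_index + len(self._frames) - 1

    def seek(self, index):
        """Clamp a requested position to what the buffer still holds."""
        if not self._frames:
            raise LookupError("buffer is empty")
        return max(self.oldest, min(index, self.newest))

    def read(self, index):
        if not self._frames or not self.oldest <= index <= self.newest:
            raise IndexError("frame is not in the buffer")
        return self._frames[index - self._first_index]
```

A rewind request below `oldest` is clamped to the oldest retained frame, and a fast-forward request above `newest` is clamped to the most recently received frame.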

User profiling and personalization can be used to automate the selection process 
of which movies to store on the local mass storage device. 

The apparatus of the user's equipment storing the introductory portion may, but
need not, be the same apparatus as the one buffering the content information supplied via the
NVOD. If these are two separate apparatuses, it is required that they cooperate to minimize the
perceivable effect in terms of an interruption when switching from the one to the other for
playing out. If they are one and the same storage device, it is required that it allows read and
write operations at the same time.

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention is explained by way of example and with reference to the 
accompanying drawings, wherein: 
Fig. 1 is a block diagram of an NVOD server system in the invention;

Figs. 2 and 3 are block diagrams of a client for connection to the system of
Fig. 1; and

Fig. 4 is a block diagram of some details of the client.

Throughout the figures, same reference numerals indicate similar or
corresponding features.

PREFERRED EMBODIMENTS 
First, an environment for analog video is discussed in detail. Then a system for
digital video is discussed.

Fig. 1 is a block diagram of an NVOD server system 100 for delivery of analog
video and audio. System 100 comprises an analog Audio/Video (A/V) source 102, e.g., a
video tape recorder (VTR), a frame counter 104, a data insertion sub-system 106 and a storage
sub-system 108. A/V source 102 supplies analog A/V program content. Counter 104 counts
the frames in the introductory portion of the A/V content, and sub-system 106 adds the frame
number as data to the VBI (vertical blanking interval) after the associated frame. Alternatively,
a flag is inserted in the VBI only after the last frame, or one of the last frames, of the beginning
portion of the A/V program content. The flag then indicates that the switching should occur.
Having a flag after a frame preceding the last frame is advantageous, for example, to take into
account the latency in the control process of the switching. The A/V content including the
labeled frames of the introductory portion of each A/V program is then stored in storage
sub-system 108. System 100 further comprises a program information generator 110, a download
scheduler 112, a sub-system 114 for data insertion in the VBI, and an interface 116 to a
delivery network 118. Generator 110 generates program information under control of
download scheduler 112. The program information includes the number of frames in the
introductory portion, i.e., the number of labeled frames and, for example, the title of the
program, the name of the content owner and the name of the author. Scheduler 112 determines
how frequently and when the introductory portion of each program is made available to
delivery network 118 for being stored at the client (not shown here), e.g., each night from 3am
to 4am. Sub-system 114 inserts the information generated in generator 110 into the VBI at the
beginning of the introductory portion of each program to be downloaded to the client via
network 118.
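
The two labeling alternatives described above (a frame number per frame, or a single flag placed at or shortly before the last frame) may be sketched as follows. The payload representation as Python dictionaries, the function name, and the `latency_frames` parameter are assumptions made for illustration; the actual VBI data format is not specified here.

```python
def label_frames(intro_frame_count, flag_only=False, latency_frames=0):
    """Return one VBI payload per frame of the introductory portion.

    flag_only=False: every frame carries its frame number (1-based).
    flag_only=True:  a single switch flag is placed `latency_frames`
                     frames before the last frame; other frames carry
                     no payload (None).
    """
    if intro_frame_count <= 0:
        raise ValueError("intro_frame_count must be positive")
    if not flag_only:
        return [{"frame_number": n} for n in range(1, intro_frame_count + 1)]
    if not 0 <= latency_frames < intro_frame_count:
        raise ValueError("latency_frames must lie within the intro")
    flag_index = intro_frame_count - 1 - latency_frames
    return [
        {"switch_flag": True} if i == flag_index else None
        for i in range(intro_frame_count)
    ]
```

Placing the flag one or more frames early, as in `label_frames(4, flag_only=True, latency_frames=1)`, leaves the controller time to act before the last frame has been played out.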

Fig. 2 is a block diagram of a client 200 for receipt of the introductory portions
of the A/V content programs as downloaded from system 100 for local storage. Client 200
comprises an interface 202, a VBI parser 204, a memory 206, A/D converters 208 and 210,
MPEG encoders 212 and 214, a multiplexer 216 and a memory 218.

Interface 202 couples client 200 to delivery network 118 and supplies an audio
path and a video path. Parser 204 is connected to the video path and parses the VBI
information. Parser 204 extracts the program information, such as the program's title, the
names of the content owner and of the author, etc., as generated by generator 110. Parser 204
further extracts the number of labeled video frames of the introductory portion of the program,
and the frame number currently received. The program information and the total number of
labeled frames are stored in a data base in memory 206. A/D converter 208 converts the analog
video signal received over network 118 into a digital video signal. The digital video signal is
supplied to MPEG encoder 212 for compression. A/D converter 210
converts the analog audio signal received over network 118 into a digital audio signal. The
digital audio signal is compressed by MPEG encoder 214. The compressed digital audio and
video signals are then stored in memory 218 via multiplexer 216.

Fig. 3 is a block diagram of client 200 of Fig. 2 showing components for playing
out the content. In addition to the components introduced in Fig. 2, client 200 comprises a
buffer 302, a switch 304 with an output 306, a controller 308 and a frame counter 310. When
the user starts playing out the introductory part of a specific A/V content program that is
stored in local storage 218, switch 304 is in the position wherein output 306 is connected to
storage 218. Counter 310 keeps track of the frames supplied by storage 218. Controller 308
receives a signal representative of the number of frames supplied by storage 218. Controller
308 also receives from data base 206 the total number of frames comprised in the introductory
part stored. Controller 308 compares the frame number received from counter 310 with the total
number of frames as stored for this introductory part. If the controller determines that the
numbers are equal, switch 304 is controlled so as to connect to buffer 302.
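
The comparison performed by controller 308 can be summarized in a short sketch. The class name, the string labels for the two sources, and the callback-style interface are hypothetical; only the compare-and-switch rule comes from the description above.

```python
class SwitchController:
    """Selects the play-out source by comparing a frame count with the
    total number of frames in the stored introductory part."""

    def __init__(self, total_intro_frames):
        if total_intro_frames <= 0:
            raise ValueError("total_intro_frames must be positive")
        self.total_intro_frames = total_intro_frames
        self.frames_played = 0
        self.source = "storage"  # the switch starts connected to local storage

    def on_frame_played(self):
        """Call once per frame supplied; returns the source for the next frame."""
        if self.source == "storage":
            self.frames_played += 1
            if self.frames_played == self.total_intro_frames:
                self.source = "buffer"  # counts are equal: connect to the buffer
        return self.source
```

For a three-frame introductory part, the source reported after each played frame is storage, storage, buffer, and it then remains buffer.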

Buffer 302 receives the digitized compressed content that is supplied by system 100
in the NVOD mode and received at client 200 via A/D converters 208 and 210 and encoders
212 and 214. The size of buffer 302 is determined by the stagger time interval of the NVOD
service, which in turn determines the length needed for the introductory portion. When the user
starts playing out the introductory portion of a program from local storage 218, system 100 selects
among the NVOD programs on delivery network 118 that specific channel that started
supplying the same program the shortest time ago. Such information for controlling that
selection is, for example, available from data base 206. Assume that the last frame of the
introductory portion is being supplied from local storage 218. The same frame is now
available somewhere in buffer 302 because an overlap in content is required. All frames of the
introductory portion have been tagged or labeled. Controller 308 therefore controls buffer 302
to start supplying the frame after the last tagged one.
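
The channel selection and the position at which buffer 302 takes over can be sketched as below. The sketch assumes, hypothetically, that channel k starts the program at times k times the stagger interval (repeating cyclically over all channels), that time is measured in seconds from the start of channel 0, and that the buffer is addressed by a 0-based index; none of these conventions is mandated by the embodiment.

```python
def select_channel(request_time_s, stagger_s, n_channels):
    """Pick the NVOD channel that started the program most recently.

    Returns (channel, elapsed_s), where elapsed_s is how far that
    channel's transmission has already progressed at the request time.
    """
    if stagger_s <= 0 or n_channels <= 0 or request_time_s < 0:
        raise ValueError("invalid schedule parameters")
    slot = int(request_time_s // stagger_s)
    channel = slot % n_channels
    elapsed_s = request_time_s - slot * stagger_s  # always < stagger_s
    return channel, elapsed_s


def first_buffer_index(intro_frames, frames_missed):
    """0-based index in the buffer of the frame after the last tagged one.

    The buffer starts with program frame `frames_missed + 1`, so the frame
    following the introductory portion sits `intro_frames - frames_missed`
    positions into the buffer. Overlap requires frames_missed <= intro_frames.
    """
    if frames_missed > intro_frames:
        raise ValueError("no overlap between intro and buffered content")
    return intro_frames - frames_missed
```

Because the elapsed time is always shorter than the stagger interval, an introductory portion as long as the stagger interval guarantees the required overlap.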
Fig. 4 shows switch 304 in more detail. Switch 304 comprises demultiplexers
402 and 404, audio decompressors 406 and 408, an audio mixer 410, a D/A converter 412, a
video path switch 414, a decoder 416 and a D/A converter 418. Demultiplexer 402 is
connected to input A of switch 304 and demultiplexer 404 is connected to input B of switch
304 (see Fig. 3). Demultiplexers 402 and 404 each generate a video stream and an audio
stream.

The audio stream from demultiplexer 402 is supplied to a decompressor 406.
The audio stream from demultiplexer 404 is supplied to a decompressor 408. The outputs of
decompressors 406 and 408 are connected to data inputs of mixer 410. Mixer 410 is controlled
by controller 308. The output of mixer 410 is connected to D/A converter 412 that supplies
analog audio output. Mixer 410 is controlled to produce a glitch-free audio signal by, for
example, phasing in the audio signal received from buffer 302 and phasing out the audio signal
from local storage 218.
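
One possible way of phasing one audio signal out while phasing the other in is a linear cross-fade, sketched below. The linear ramp, the function name, and the representation of audio as lists of floating-point samples are assumptions; the description does not prescribe a particular mixing law.

```python
def crossfade(outgoing, incoming):
    """Linearly fade from `outgoing` (local storage) to `incoming` (buffer).

    Both sequences must cover the same overlap interval and have the
    same number of samples.
    """
    if len(outgoing) != len(incoming):
        raise ValueError("both signals must have the same length")
    n = len(outgoing)
    mixed = []
    for i, (a, b) in enumerate(zip(outgoing, incoming)):
        # Weight of the incoming signal rises from 0 to 1 over the overlap.
        w = i / (n - 1) if n > 1 else 1.0
        mixed.append((1.0 - w) * a + w * b)
    return mixed
```

Since both inputs carry the same program content during the overlap, identical inputs pass through unchanged, e.g. `crossfade([0.2, 0.4], [0.2, 0.4])` returns `[0.2, 0.4]`.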

The video stream is switched in switch 414 between two frames from input A to
input B. Switch 414 is also controlled by controller 308. This produces a continuous bitstream
of digital video that is decoded in decoder 416, and then converted to analog in D/A converter
418 for analog play-out. Preferably, the video encoding is such that the last frame is not a
B-frame (bi-directional frame). That is, the decoding of the last frame of the introductory portion
does not require a subsequent frame. Also, the encoding is preferably such that the first frame
supplied by buffer 302 is an I-frame (intra frame), whose decoding does not require a previous
frame. These restrictions ensure that the output of switch 414 is a continuous video stream.
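
The two preferred encoding restrictions at the switching point can be expressed as a simple check. The single-letter frame-type codes "I", "P" and "B" and the function name are illustrative assumptions.

```python
def is_clean_splice(last_intro_frame_type, first_buffer_frame_type):
    """True if the switch point satisfies both preferred restrictions.

    - The last frame from local storage must not be a B-frame, because
      decoding a B-frame needs a subsequent frame.
    - The first frame from the buffer must be an I-frame, because its
      decoding needs no previous frame.
    """
    valid = {"I", "P", "B"}
    if last_intro_frame_type not in valid or first_buffer_frame_type not in valid:
        raise ValueError("frame types must be 'I', 'P' or 'B'")
    return last_intro_frame_type != "B" and first_buffer_frame_type == "I"
```

For example, ending the introductory portion on a P-frame and resuming from the buffer on an I-frame passes the check, whereas ending on a B-frame does not.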

The above example relates to analog content. The case of digital content is
relatively simple. Digital video bitstreams have time stamps. They can therefore be cut at any
frame transition and re-spliced using the proper time stamps as a reference in order to recreate
exactly the same bitstream. In order to implement the VOD concept the time stamp of the end
of the introductory portion of the program needs to be communicated to the client in the
download phase. To splice the program content the client compares the stored time stamp (that
indicates the end of the introductory portion) to the current time stamp of the bitstream being
played out from the local storage. When the comparison signals that the time stamps are equal,
the client switches the video source from local storage to the buffer. The configuration of the
system for digital content is, for example, functionally similar to that of the analog case
discussed above, but does not comprise A/D and D/A converters 208, 210, 412, and 418, does
not comprise compressors 212 and 214 and decompressors 406 and 408, does not comprise the
components for processing the frame numbers, and it does not need audio mixer 410 either.
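
The time-stamp comparison for digital content may be sketched as follows. Modeling the bitstream as a list of (time stamp, payload) pairs with strictly increasing integer time stamps is an assumption made for illustration, as are the function and parameter names.

```python
def splice(local_units, buffered_units, intro_end_ts):
    """Play local units up to the stored end time stamp, then continue
    from the buffer with the first unit after that time stamp.

    Each unit is a (timestamp, payload) pair; time stamps are assumed
    to increase strictly within each stream.
    """
    output = []
    switched = False
    for ts, payload in local_units:
        output.append((ts, payload))
        if ts == intro_end_ts:  # comparison signals equality: switch source
            switched = True
            break
    if not switched:
        raise ValueError("end time stamp not found in local storage")
    # The buffer overlaps the intro; skip everything already played.
    output.extend(unit for unit in buffered_units if unit[0] > intro_end_ts)
    return output
```

With an introductory portion ending at time stamp 2 and a buffer that began recording at time stamp 1, the output runs through time stamps 0, 1, 2 from local storage and continues with 3, 4 from the buffer, recreating the original sequence.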

Within the context of the current invention, the following patent documents are
incorporated herein by reference:

- U.S. Serial No. 09/283,545 (attorney docket PHA 23,633) filed 4/1/99 for Yevgeniy Eugene
Shteyn for TIME- AND LOCATION-DRIVEN PERSONALIZED TV. This document relates
to a server system that enables a subscriber to select a specific broadcast program for
recording and a specific location and time frame for play-out of the recorded program.

- U.S. Serial No. 09/149,950 (attorney docket PHA 23,495) filed 9/9/98 for Raoul Mallart for
REAL-TIME VIDEO GAME USES EMULATION OF STREAMING OVER THE
INTERNET IN A BROADCAST EVENT. In a broadcast application on a client-server
network the streaming is emulated of animation data over the Internet to a large number of
clients. The animation is considered a sequence of states. State information is sent to the
clients instead of the graphics data itself. The clients generate the animation data itself under
control of the state information. The server and clients communicate using a shared object
protocol. Thus, streaming is accomplished as well as a broadcast without running into severe
network bandwidth problems. This approach is used to map a real life event, e.g., a motor
race, onto a virtual environment in order to let the user participate in a virtual race against the
real life professionals, the dynamics of the virtual environment being determined by the state
changes sent to the user.

- U.S. Serial No. 09/138,782 (attorney docket PHA 23,491) filed 8/24/98 for Raoul Mallart
and Atul Sinha for EMULATION OF STREAMING OVER THE INTERNET IN A
BROADCAST APPLICATION. This document relates to a broadcast application on a
client-server network wherein the streaming is emulated of animation data over the Internet to a large
number of clients. The animation is considered a sequence of states. State information is sent
to the clients instead of the graphics data itself. The clients generate the animation data itself
under control of the state information. The server and clients communicate using a shared
object protocol. Thus, streaming is accomplished as well as a broadcast without running into
severe network bandwidth problems.

- U.S. Serial No. 09/053,448 (attorney docket PHA 23,383) filed 4/1/98 for Raoul Mallart and
Atul Sinha for GROUP-WISE VIDEO CONFERENCING USES 3D-GRAPHICS MODEL
OF BROADCAST EVENT. This document relates to a TV broadcast service to multiple
geographically distributed end-users that is integrated with a conferencing mode. Upon a
certain event in the broadcast, specific groups of end users are switched to a conference mode
under software control so that the group is enabled to discuss the event. The conference mode
is enhanced by a 3D graphics model of the video representation of the event that is
downloaded to the groups. The end users are capable of interacting with the model to discuss
alternatives to the event.

1. A method of emulating a VOD service in an NVOD architecture (108), the
method comprising:

- making content information available to an end-user (200) in the NVOD architecture;

- enabling to store (218) an introductory portion of the content information at the end-user's
equipment;

- during playing out of the introductory portion at the end-user enabling to buffer (302) the
content information supplied in the NVOD architecture at the end-user's equipment;

- enabling to switch (304) from playing out the stored introductory portion to playing out the
buffered content information.

2. The method of claim 1, wherein:

- the content information comprises multiple programs; 

- the introductory portion comprises respective ones of multiple introductory parts associated 
with respective ones of the multiple programs; and 

- the enabling to switch comprises enabling to shift from playing out a specific one of the
introductory parts to playing out a specific one of the multiple programs associated with the
specific introductory part.

3. The method of claim 1, wherein: 

- the content information is broadcast via a data network (118);

- the introductory portion is supplied via the data network. 

4. The method of claim 1, wherein:

- the content information is broadcast via a TV network; and

- the introductory portion is supplied via the TV network.

5. The method of claim 1, wherein: 

- the content information is broadcast via a TV network; and 

- the introductory portion is supplied via a data network. 



6. The method of claim 1, wherein the introductory portion is provided to the
end-user stored on a physical device.

7. The method of claim 2, wherein:

- the content information comprises frames in an analog format;

- the enabling to switch comprises labeling (104/106) successive ones of the frames of the
introductory portion in a VBI signal; and

- the switching is controlled by the labeling.

8. The method of claim 2, wherein: 

- the content information comprises frames in a digital format with time stamps; and 

- the switching is controlled by the time stamps. 

9. A client apparatus (200) for use in a client-server system that emulates a VOD
service in an NVOD architecture, wherein:

- the system has a server (108) that makes content information available to the client in an
NVOD mode;

- the client has a storage (218) for storing an introductory portion of the content information;

- the client has a buffer (302) for buffering the content information supplied in the NVOD
mode during playing out of the introductory portion; and

- the client has a switch (304) to control switching from playing out the introductory portion
from the storage to playing out the content information from the buffer.

10. The client apparatus of claim 9, wherein:

- the server makes available the content information in an analog format; 

- the introductory portion has frames with labels; 

- the client has a parser (204) to extract information about the labels; and 

- the client has a controller (308) to control the switching under control of the information. 

11. The client apparatus of claim 9, wherein:

- the server makes available the content information in a digital format with time stamps; and

- the client has a controller to control the switching under control of the time stamps.